
    Deterministic 1-k routing on meshes with applications to worm-hole routing

    In 1-k routing, each of the n^2 processing units of an n × n mesh-connected computer initially holds 1 packet, which must be routed such that any processor is the destination of at most k packets. This problem reflects practical routing demands better than the commonly studied permutation routing. 1-k routing also has implications for hot-potato worm-hole routing, which is of great importance for real-world systems. We present a near-optimal deterministic algorithm running in sqrt(k)·n/2 + O(n) steps. We give a second algorithm with a slightly worse routing time but a working queue size of three. Applying this algorithm considerably reduces the routing time of hot-potato worm-hole routing. Non-trivial extensions are given to the general l-k routing problem and to routing on higher-dimensional meshes. Finally, we show that k-k routing can be performed in O(k·n) steps with working queue size four. Hereby the hot-potato worm-hole routing problem can be solved in O(k^{3/2}·n) steps.
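
    To make the setting concrete, the sketch below simulates plain greedy XY (row-first) routing of a random 1-k instance on an n × n mesh and reports the number of steps and the maximum queue occupancy, under a simple synchronous one-packet-per-link model. It is only an illustrative baseline of my own (function names and the routing rule are assumptions), not the near-optimal algorithm of the paper.

```python
import random
from collections import Counter

def random_1k_instance(n, k, seed=0):
    """Random 1-k instance: every node starts with one packet, and every node is
    the destination of at most k packets (assumed setup, not from the paper)."""
    rng = random.Random(seed)
    slots = [(r, c) for r in range(n) for c in range(n) for _ in range(k)]
    return rng.sample(slots, n * n)        # destination of the packet born at node i

def simulate_greedy_xy(n, dests):
    """Greedy XY routing: correct the column first, then the row.  Each directed
    link carries at most one packet per step; the other packets wait in a queue."""
    pos = [(i // n, i % n) for i in range(n * n)]     # packet i starts at node i
    steps, max_queue = 0, 0
    while any(pos[i] != dests[i] for i in range(len(pos))):
        wants = {}                                    # link -> packets wanting to cross it
        for i, (r, c) in enumerate(pos):
            dr, dc = dests[i]
            if (r, c) == (dr, dc):
                continue
            if c != dc:
                nxt = (r, c + (1 if dc > c else -1))
            else:
                nxt = (r + (1 if dr > r else -1), c)
            wants.setdefault(((r, c), nxt), []).append(i)
        in_transit = Counter(p for i, p in enumerate(pos) if p != dests[i])
        max_queue = max(max_queue, max(in_transit.values()))
        for (_src, dst), pids in wants.items():       # one packet crosses each link
            pos[min(pids)] = dst
        steps += 1
    return steps, max_queue

if __name__ == "__main__":
    n, k = 16, 2
    print(simulate_greedy_xy(n, random_1k_instance(n, k)))   # (steps, worst queue size)
```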

    Sample sort on meshes

    This paper provides an overview of lower and upper bounds for mesh-connected processor networks. Most attention is devoted to routing and sorting problems, but other problems are mentioned as well. Results from 1977 to 1995 are covered. We provide numerous results, references and open problems. The text is completed with an index. This is a worked-out version of the author's contribution to a joint paper with Grammatikakis, Hsu and Kraetzl on multicomputer routing, submitted to JPDC.

    A powerful heuristic for telephone gossiping

    A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption makes it possible to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information-dispersion patterns, and it can handle both the unit-cost and the linear-cost model. In fact, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. Also for broadcasting the heuristic improves on many previously known results. A second heuristic works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but with refined search techniques it can tackle even larger problems, the main limitation being storage capacity. Another advantage is that the constructed schedules can be represented concisely.
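
    As a point of reference, here is a minimal greedy round-construction heuristic for gossiping in the unit-cost telephone model: in every round it picks a matching of calls, preferring calls that exchange the most information that is new to at least one endpoint. This is a simple baseline of my own, not the refined heuristic of the paper; the function name and the tie-breaking rule are illustrative assumptions.

```python
def greedy_gossip_rounds(n, edges):
    """Greedy gossiping baseline (unit-cost telephone model): each round is a
    matching of calls, and the two endpoints of a call exchange everything they know."""
    know = [{v} for v in range(n)]                    # node v initially knows only its own item
    rounds = 0
    while any(len(k) < n for k in know):              # until everybody knows everything
        gain = lambda uv: len(know[uv[0]] ^ know[uv[1]])   # items only one endpoint knows
        busy, matching = set(), []
        for u, v in sorted(edges, key=gain, reverse=True):
            if u not in busy and v not in busy and gain((u, v)) > 0:
                matching.append((u, v))
                busy.update((u, v))
        if not matching:                              # graph disconnected: gossiping impossible
            return None
        for u, v in matching:                         # the calls of one round are simultaneous
            union = know[u] | know[v]
            know[u], know[v] = union, set(union)
        rounds += 1
    return rounds

# usage: gossiping on an 8-node cycle
print(greedy_gossip_rounds(8, [(i, (i + 1) % 8) for i in range(8)]))
```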

    Towards practical permutation routing on meshes

    We consider the permutation routing problem on two-dimensional n × n meshes. To be practical, a routing algorithm is required to ensure a very small queue size Q and a very low running time T, not only asymptotically but particularly also for the practically important values of n up to 1000. With a technique inspired by a scheme of Kaklamanis/Krizanc/Rao, we obtain a near-optimal result: T = 2·n + O(1) with Q = 2. Although Q is very attractive now, the lower-order terms in T make this algorithm highly impractical. Therefore we present simple schemes which are asymptotically slower, but have T around 3·n for all n and Q between 2 and 8.
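
    For context on why T = 2·n + O(1) is near-optimal, the short LaTeX note below records the standard diameter argument (a well-known fact, not a result of this paper).

```latex
% Diameter lower bound for permutation routing on an n x n mesh:
% a packet that must travel from corner (1,1) to corner (n,n)
% has to cross (n-1) row edges and (n-1) column edges, hence
T \;\ge\; (n-1) + (n-1) \;=\; 2n - 2
% for every routing algorithm, so T = 2n + O(1) is optimal up to
% an additive lower-order term.
```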

    Vertex labeling and routing in expanded Apollonian networks

    We present a family of networks, expanded deterministic Apollonian networks, which are a generalization of the Apollonian networks and are simultaneously scale-free, small-world, and highly clustered. We introduce a labeling of their vertices that makes it possible to determine a shortest-path route between any two vertices of the network based only on the labels.

    External Selection

    Sequential selection has been solved in linear time by Blum et al. Running this algorithm on a problem of size N with N > M, the size of the main memory, results in an algorithm that reads and writes O(N) elements, while the number of comparisons is also bounded by O(N). This is asymptotically optimal, but the constants are so large that in practice sorting is faster for most values of M and N. This paper provides the first detailed study of the external selection problem. A randomized algorithm of a conventional type is close to optimal in all respects. Our deterministic algorithm is more or less the same, but first it builds an index structure over all the elements. This effort is not wasted: the index structure allows the retrieval of elements so that we do not need a second scan through all the data. The index structure can also be used for repeated selections, and can be extended over time. For a problem of size N, the deterministic algorithm reads N + o(N) elements and writes only o(N) elements, and is thereby optimal up to lower-order terms.
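
    The sketch below shows a two-pass, sampling-based selection of the k-th smallest element over data that is only accessible through sequential scans, in the spirit of the "conventional" randomized approach mentioned above. It is a hedged illustration: the function name, the sample rate and the safety margin are my own choices, and it is not the paper's deterministic, index-based algorithm, which avoids the second full scan.

```python
import random

def external_select(read_pass, N, k, sample_rate=0.01, seed=0):
    """Return the k-th smallest element (0-based) using only sequential scans.
    `read_pass()` yields a fresh iterator over the data, mimicking a disk scan."""
    rng = random.Random(seed)
    # Pass 1: random sample; pick two pivots expected to bracket rank k.
    sample = sorted(x for x in read_pass() if rng.random() < sample_rate)
    if not sample:
        return sorted(read_pass())[k]                  # degenerate fallback
    t = k / N * len(sample)                            # expected position of rank k in the sample
    slack = 3 * len(sample) ** 0.5 + 1                 # safety margin around the estimate
    lo = sample[max(0, int(t - slack))]
    hi = sample[min(len(sample) - 1, int(t + slack))]
    # Pass 2: count elements below the bracket, keep only the candidates inside it.
    below, candidates = 0, []
    for x in read_pass():
        if x < lo:
            below += 1
        elif x <= hi:
            candidates.append(x)
    if below <= k < below + len(candidates):           # bracket caught rank k
        return sorted(candidates)[k - below]
    return sorted(read_pass())[k]                      # rare miss: fall back to sorting

# usage: the "external" data is simulated here by an in-memory list
rng = random.Random(1)
data = [rng.random() for _ in range(100_000)]
print(external_select(lambda: iter(data), len(data), len(data) // 2))  # the median
```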

    Ultimate Parallel List Ranking?

    Two improved list-ranking algorithms are presented. The "peeling-off" algorithm leads to an optimal PRAM algorithm, but was designed with application on a real parallel machine in mind. It is simpler than earlier algorithms, and in a range of problem sizes where previously several algorithms were required for the best performance, this single algorithm now suffices. If the problem size is much larger than the number of available processors, then the "sparse-ruling-sets" algorithm is even better. In previous versions this algorithm had very restricted practical applicability because of the large number of communication rounds it performed. This weakness is overcome by adding two new ideas, each of which reduces the number of communication rounds by a factor of two.
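
    For readers unfamiliar with list ranking, here is the classic pointer-jumping formulation (Wyllie's textbook algorithm, not one of the two algorithms of the paper), written as a sequential simulation; every iteration of the loop corresponds to one synchronous parallel round, which is exactly the quantity that the communication-round reductions above are about.

```python
def list_rank_pointer_jumping(nxt):
    """Pointer jumping: rank[i] = number of links from i to the tail of the list.
    The tail is marked by nxt[tail] == tail.  Each while-iteration corresponds to
    one synchronous parallel round (all reads happen before all writes)."""
    n = len(nxt)
    succ = list(nxt)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    while any(succ[i] != succ[succ[i]] for i in range(n)):   # some pointer not yet at the tail
        rank = [rank[i] + rank[succ[i]] for i in range(n)]   # accumulate the skipped links
        succ = [succ[succ[i]] for i in range(n)]             # double the jump length
    return rank

# usage: the list 0 -> 1 -> 2 -> 3 -> 4 (tail 4) has ranks [4, 3, 2, 1, 0]
print(list_rank_pointer_jumping([1, 2, 3, 4, 4]))
```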

    Better Trade-offs for Parallel List Ranking

    An earlier parallel list-ranking algorithm performs well for problem sizes N that are extremely large in comparison to the number of PUs P. However, no existing algorithm gives good performance for reasonable loads. We present a novel family of algorithms that achieve a better trade-off between the number of start-ups and the routing volume. We have implemented them on an Intel Paragon, and they turn out to considerably outperform all earlier algorithms: with P = 2 the sequential algorithm is already beaten for N = 25,000; for P = 100 and N = 10^7 the speed-up is 21, and for N = 10^8 it even reaches 30. A modification of one of our algorithms settles a theoretical question: we show that on one-dimensional processor arrays, list ranking can be solved with a number of steps equal to the diameter of the network.
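
    The trade-off mentioned above is usually measured in a linear communication-cost model; the LaTeX fragment below states it in the generic form assumed here (the symbols are illustrative, not the paper's notation).

```latex
% Generic linear cost model for one processor's communication:
%   S   = number of start-ups (messages sent),
%   V   = routing volume (data words sent),
%   t_s = start-up latency per message,  t_w = transfer time per word.
T_{\mathrm{comm}} \;\approx\; S \cdot t_s \;+\; V \cdot t_w
% Fewer, larger messages reduce the S*t_s term; sending less data reduces V*t_w.
```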

    From parallel to external list ranking

    Novel algorithms are presented for parallel and external-memory list ranking. The same algorithms can be used for computing basic tree functions, such as the depth of a node. The parallel algorithm stands out through its low memory use, its simplicity and its performance. For a large range of problem sizes it is almost as fast as the fastest previous algorithms. On a Paragon with 100 PUs, each holding 10^6 nodes, we obtain a speed-up of 25. For external-memory list ranking, the best algorithm so far has been an optimized version of independent-set removal. Actually, this algorithm is not good at all: for a list of length N, the paging volume is about 72 N. Our new algorithm reduces this to 18 N. The algorithm has been implemented, and the theoretical results are confirmed.
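
    To illustrate the independent-set-removal technique referred to above, here is a plain in-memory sketch that repeatedly splices out a random independent set of nodes and reinstates them afterwards. It is a baseline of my own making (names and the coin-flip rule are assumptions), not the paper's external-memory variant or the new algorithm that lowers the paging volume.

```python
import random

def list_rank_is_removal(nxt, seed=0):
    """List ranking by independent-set removal: splice out non-adjacent nodes,
    rank the shrunken list, then reinstate the removed nodes in reverse order.
    nxt[tail] == tail marks the end; rank[i] = number of links from i to the tail."""
    rng = random.Random(seed)
    n = len(nxt)
    tail = next(i for i in range(n) if nxt[i] == i)
    succ, jump = list(nxt), [0 if i == tail else 1 for i in range(n)]
    pred = [None] * n
    for i in range(n):
        if i != tail:
            pred[succ[i]] = i
    alive, removed = set(range(n)) - {tail}, []
    while alive:
        coin = {u: rng.random() < 0.5 for u in alive}
        coin[tail] = False
        # u is spliced out if it flips heads and its successor flips tails,
        # so no two adjacent nodes are ever removed in the same round.
        for u in [u for u in alive if coin[u] and not coin[succ[u]]]:
            removed.append((u, jump[u], succ[u]))        # remember the bypassed link
            p, s = pred[u], succ[u]
            if p is not None:                            # the predecessor absorbs u's link
                succ[p], jump[p] = s, jump[p] + jump[u]
            pred[s] = p
            alive.remove(u)
    rank = [0] * n                                       # only the tail is left
    for u, j, s in reversed(removed):                    # reinstate, latest removal first
        rank[u] = j + rank[s]
    return rank

# usage: the list 0 -> 1 -> 2 -> 3 -> 4 (tail 4) has ranks [4, 3, 2, 1, 0]
print(list_rank_is_removal([1, 2, 3, 4, 4]))
```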